{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Practical Guide to Improving Performance: Optimizing For Throughput"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook serves as a practical guide to demonstrate how you can tune the performance of your model on Tenstorrent hardware by increasing the batch size of inputs. It will also demonstrate the appropriate way of benchmarking models on AI hardware by separating the compilation time from the run time.\n",
    "\n",
    "The tutorial will walk through an example of running the [BERT](https://en.wikipedia.org/wiki/BERT_(language_model)) model on Tenstorrent AI accelerator hardware. The model weights will be directly downloaded from the [HuggingFace library](https://huggingface.co/docs/transformers/model_doc/bert) and executed through the PyBUDA SDK."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Guide Overview\n",
    "\n",
    "In this guide, we will talk through the steps for running the BERT model trained on the [SST2](https://nlp.stanford.edu/sentiment/index.html) dataset for the **Text Classification** task.\n",
    "\n",
    "You will learn how to vary the input batch size of the model to achieve higher throughput performance. You will also learn how to configure a benchmark framework for evaluating the model performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Import libraries\n",
    "\n",
    "Make sure that you have an activate Python environment with the latest version of PyBUDA installed.\n",
    "\n",
    "We will start by first pip installing the `evaluate` library which will be used to calculate the accuracy metric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install a pip package in the current Jupyter kernel\n",
    "import sys\n",
    "!{sys.executable} -m pip install evaluate==0.4.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the pybuda library and additional libraries required for this tutorial\n",
    "import time\n",
    "from typing import Any, Dict, List, Tuple\n",
    "\n",
    "import evaluate\n",
    "import pybuda\n",
    "import torch\n",
    "from datasets import load_dataset\n",
    "from torch.utils.data import DataLoader, Dataset\n",
    "from transformers import BertForSequenceClassification, BertTokenizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Create helper classes and functions\n",
    "\n",
    "We will create some helper classes and functions to improve code reusability throughout this tutorial.\n",
    "\n",
    "* `SST2Dataset` -- Python Class to hold a preprocessed version of the SST2 dataset used for evaluation\n",
    "* `eval_fn` -- function to compute the evaluation score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a Dataset Class to preprocess the data\n",
    "class SST2Dataset(Dataset):\n",
    "    \"\"\"Configurable SST-2 Dataset.\"\"\"\n",
    "\n",
    "    def __init__(self, dataset: Any, tokenizer: Any, split: str, seq_len: int):\n",
    "        \"\"\"\n",
    "        Init and preprocess SST-2 dataset.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        dataset : Any\n",
    "            SST-2 dataset\n",
    "        tokenizer : Any\n",
    "            tokenizer object from HuggingFace\n",
    "        split : str\n",
    "            Which split to use i.e. [\"train\", \"validation\", \"test\"]\n",
    "        seq_len : int\n",
    "            Sequence length\n",
    "        \"\"\"\n",
    "        self.sst2 = dataset[split]\n",
    "        self.data = [\n",
    "            (\n",
    "                tokenizer(\n",
    "                    item[\"sentence\"],\n",
    "                    return_tensors=\"pt\",\n",
    "                    max_length=seq_len,\n",
    "                    padding=\"max_length\",\n",
    "                    return_token_type_ids=False,\n",
    "                    truncation=True,\n",
    "                ),\n",
    "                item[\"label\"],\n",
    "            )\n",
    "            for item in self.sst2\n",
    "        ]\n",
    "\n",
    "        for data in self.data:\n",
    "            tokenized = data[0]\n",
    "            for item in tokenized:\n",
    "                tokenized[item] = tokenized[item].squeeze()\n",
    "\n",
    "    def __len__(self) -> int:\n",
    "        \"\"\"\n",
    "        Return length of dataset.\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        int\n",
    "            Length of dataset\n",
    "        \"\"\"\n",
    "        return len(self.data)\n",
    "\n",
    "    def __getitem__(self, index: int) -> Tuple[Dict[str, torch.Tensor], int]:\n",
    "        \"\"\"\n",
    "        Return sample from dataset.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        index : int\n",
    "            Index of sample\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        Tuple\n",
    "            Data sample in format of X, y\n",
    "        \"\"\"\n",
    "        X, y = self.data[index]\n",
    "        return X, y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define evaluation function\n",
    "def eval_fn(outputs: List[torch.tensor], labels: List[int], metric_type: str) -> float:\n",
    "    \"\"\"\n",
    "    Evaluation function for measuring model accuracy.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    outputs : List[torch.tensor]\n",
    "        Predicted outputs from model\n",
    "    labels : List[int]\n",
    "        List of true labels\n",
    "    metric_type : str\n",
    "        Type of metric to return i.e. accuracy, recall, precision, etc.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    float\n",
    "        Evaluation score.\n",
    "    \"\"\"\n",
    "\n",
    "    # set evaluation metric for dataset\n",
    "    accuracy_metric = evaluate.load(metric_type)\n",
    "\n",
    "    # initialize lists to store predictions and labels\n",
    "    pred_labels = []\n",
    "    true_labels = []\n",
    "\n",
    "    # store all predictions\n",
    "    for output in outputs:\n",
    "        pred_labels.extend(torch.argmax(output, axis=-1))\n",
    "\n",
    "    # store all labels\n",
    "    for label in labels:\n",
    "        true_labels.extend(label)\n",
    "\n",
    "    # compute the accuracy\n",
    "    eval_score = accuracy_metric.compute(references=true_labels, predictions=pred_labels)\n",
    "\n",
    "    return eval_score[metric_type]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Download the model weights from HuggingFace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load BERT tokenizer and model from HuggingFace for text classification task\n",
    "model_ckpt = \"textattack/bert-base-uncased-SST-2\"\n",
    "tokenizer = BertTokenizer.from_pretrained(model_ckpt)\n",
    "model = BertForSequenceClassification.from_pretrained(model_ckpt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Set optimal configurations\n",
    "\n",
    "For every model, you can adjust TT-BUDA configuration parameters to achieve optimized performance. Some key parameters include:\n",
    "\n",
    "* Data format e.g. BFP8, FP16_b, FP16, etc.\n",
    "* Math fidelity\n",
    "* Balancer policy\n",
    "* etc...\n",
    "\n",
    "For a full list of tuneable parameters, please refer to the TT-BUDA documentation: <https://docs.tenstorrent.com/tenstorrent/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set optimal configurations\n",
    "compiler_cfg = pybuda.config._get_global_compiler_config()\n",
    "compiler_cfg.default_df_override = pybuda._C.DataFormat.Float16_b\n",
    "compiler_cfg.enable_auto_transposing_placement = True\n",
    "compiler_cfg.balancer_policy = \"Ribbon\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Instantiate Tenstorrent device\n",
    "\n",
    "The first time we use PyBUDA, we must initialize a `TTDevice` object which serves as the abstraction over the target hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tt0 = pybuda.TTDevice(\n",
    "    name=\"tt_device_0\",  # here we can give our device any name we wish, for tracking purposes\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6: Create a PyBUDA module from PyTorch model\n",
    "\n",
    "Next, we must abstract the PyTorch model loaded from HuggingFace into a `pybuda.PyTorchModule` object. This will let the BUDA compiler know which model architecture and AI framework it has to compile.\n",
    "\n",
    "We then \"place\" this module onto the previously initialized `TTDevice`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create module\n",
    "pybuda_module = pybuda.PyTorchModule(\n",
    "    name = \"pt_bert_text_classification\",  # give the module a name, this will be used for tracking purposes\n",
    "    module=model  # specify the model that is being targeted for compilation\n",
    ")\n",
    "\n",
    "# Place module on device\n",
    "tt0.place_module(module=pybuda_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7: Load the SST2 dataset for evaluation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = SST2Dataset(dataset=load_dataset(\"glue\", \"sst2\"), tokenizer=tokenizer, split=\"validation\", seq_len=128)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 8: Set the batch size, prep the dataset, and load a sample input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set batch size\n",
    "batch_size = 64\n",
    "\n",
    "# prepare the dataset for specified batch size\n",
    "generator = DataLoader(dataset, batch_size=batch_size, shuffle=False, drop_last=True)\n",
    "\n",
    "# get sample input\n",
    "sample_input, _ = next(iter(generator))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 9: Compile the model with fixed batch size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start_compilation_time = time.time()\n",
    "output_q = pybuda.initialize_pipeline(training=False, sample_inputs=list(sample_input.values()))\n",
    "end_compilation_time = time.time()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 10: Run benchmark on SST2 dataset with `batch_size==64`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run benchmark loop\n",
    "store_outputs = []\n",
    "store_labels = []\n",
    "start_runtime_time = time.time()\n",
    "for batch, labels in generator:\n",
    "    # push input to Tenstorrent device\n",
    "    tt0.push_to_inputs(batch)\n",
    "\n",
    "    # run inference on Tenstorrent device\n",
    "    pybuda.run_forward(input_count=1)\n",
    "    output = output_q.get()  # inference will return a queue object, get last returned object\n",
    "\n",
    "    # store outputs\n",
    "    store_labels.append(labels)\n",
    "    store_outputs.append(output[0].value())\n",
    "end_runtime_time = time.time()\n",
    "\n",
    "# Process output times\n",
    "total_runtime_time = end_runtime_time - start_runtime_time\n",
    "total_compilation_time = end_compilation_time - start_compilation_time\n",
    "total_samples = len(generator) *  batch_size\n",
    "eval_score = eval_fn(store_outputs, store_labels, \"accuracy\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display results\n",
    "print(\"Benchmark Result\")\n",
    "print(f\" Model compilation time: {total_compilation_time:.3f}s\")\n",
    "print(f\" Total runtime time for {total_samples} inputs: {total_runtime_time:.3f}s\")\n",
    "print(f\" Throughput: {(total_samples / total_runtime_time):.1f} samples/s\")\n",
    "print(f\" Accuracy: {(eval_score * 100):.1f}%\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 11: Shutdown PyBuda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pybuda.shutdown()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
